GH-49875: [Python] Fix timezone dropped when converting tz-aware Categorical to Arrow array #49878
Hi @AlenkaF
AlenkaF
left a comment
I think this change makes sense but the PR is lacking a test. The reproducible example from the issue can be reused and added to https://github.com/apache/arrow/blob/main/python/pyarrow/tests/test_pandas.py mimicking existing categorical tests.
cc @jorisvandenbossche in case of any opinions.
Thank you for the review!
CI failures are related, could you have a look?
Fixed:
Please retrigger the CI
@github-actions crossbow submit pandas
Revision: 0f1f595
Submitted crossbow builds: ursacomputing/crossbow @ actions-3f725ba4fc
AlenkaF
left a comment
Thanks! The failing extended builds are not connected. Will try to see if there is an issue opened already to track that.
Opened an issue for the failing builds here: #49920
@jorisvandenbossche mind having one more look before I merge?
After merging your PR, Conbench analyzed the 0 benchmarking runs that have been run so far on merge-commit ea8cef5. None of the specified runs were found on the Conbench server. The full Conbench report has more details.
Rationale for this change
When converting a pandas.Categorical with tz-aware datetime categories to a PyArrow array, the timezone information was dropped from the dictionary array's value type. This is silent data loss: no warning or error is raised, yet the timezone metadata is gone from the result.
What changes are included in this PR?
In python/pyarrow/array.pxi, the Categorical conversion was using values.categories.values (raw numpy array), which strips timezone metadata since numpy does not support tz-aware datetimes. Changed to values.categories (pandas Index) and added from_pandas=True so PyArrow uses the pandas conversion path, which correctly preserves timezone metadata.
Are these changes tested?
Yes. Verified manually
Are there any user-facing changes?
Yes, this is a bug fix. Users converting a tz-aware Categorical no longer lose the timezone in the resulting dictionary array. See #49875.
This PR contains a "Critical Fix": timezone information was silently lost during conversion, with no warning or error.